Lexical Semantic Association between Web Pages – a Lexical Knowledge Based Method
Authors
Abstract
The candidate confirms that the work submitted is her own and that appropriate credit has been given where reference has been made to the work of others.
ACKNOWLEDGEMENT
I would like to express my gratitude to my supervisor Eric Atwell and to Bill Whyte for helpful commentary and suggestions, and also to Clive Souter and George Demetriou, who provided me with the LKB dictionary and gave me good suggestions. I would also like to thank my teachers Paul & who always encourage me. Without their constant support and encouragement, I could not have completed this thesis. I want to express my respect and thankfulness to my dear parents.
ABSTRACT
The rich variety of knowledge available on the World Wide Web makes it an attractive target for data mining and language processing. In this project, a linguistic method that was developed to measure the lexical semantic association between two words is adapted to the task of measuring the semantic similarity between web pages. This lexical knowledge based method could also be used for meaning trend representation or theme representation. Themes are connotative meanings separated or contained in textual units, such as texts or web pages, and are difficult to represent quantitatively and properly. This work also proposes a scientific description of themes, viewed from the standpoint of lexical semantics. Once textual units have been tagged with the Lexical Semantic Tags contained in the Lexical Knowledge Base, the themes within the units can be generated. The semantic representation of each web page is the bag of words obtained after tagging, which is believed to contain certain themes. A program implements the measurement of lexical semantic similarity between two web pages. Various experiments have been undertaken to test the impact of text distance, noisy words and text length. To assess the precision of the methodology, we compare the results of the system with those of an existing commercial information retrieval system and with human judgements.
A theme space was created to support the evaluation.
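The tagging-then-comparison idea described in the abstract can be illustrated with a small sketch. It is only an illustration under stated assumptions: the abstract does not say which similarity measure the thesis uses, so cosine similarity over tag counts is swapped in here, and the tiny word-to-tag table `TOY_LKB`, its tag names, and the function names are all hypothetical stand-ins for the real Lexical Knowledge Base.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexical knowledge base (word -> semantic tag).
# The real LKB dictionary and its tag set are not shown in the source.
TOY_LKB = {
    "bank": "FINANCE", "loan": "FINANCE", "money": "FINANCE",
    "river": "NATURE", "tree": "NATURE", "forest": "NATURE",
}


def tag_page(text, lkb=TOY_LKB):
    """Return a bag (Counter) of semantic tags for the words found in the LKB."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(lkb[w] for w in words if w in lkb)


def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between two tag-count vectors (assumed measure)."""
    shared = bag_a.keys() & bag_b.keys()
    dot = sum(bag_a[t] * bag_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a page with no known words shares no theme
    return dot / (norm_a * norm_b)


def page_similarity(text_a, text_b, lkb=TOY_LKB):
    """Tag both pages, then compare their bags of tags."""
    return cosine_similarity(tag_page(text_a, lkb), tag_page(text_b, lkb))
```

Calling `page_similarity(page_a_text, page_b_text)` on the plain text of two pages gives a score between 0 and 1. Words missing from the table are simply ignored, which is one simple way noisy words could be filtered out; the thesis may handle them differently.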
Similar resources
Developing a Semantic Similarity Judgment Test for Persian Action Verbs and Non-action Nouns in Patients With Brain Injury and Determining its Content Validity
Objective: Evidence from brain trauma suggests that the two grammatical categories of noun and verb are processed in different regions of the brain due to differences in the complexity of grammatical and semantic information processing. Studies have shown that verbs belonging to different semantic categories lead to neural activity in different areas of the brain, and action verb processing is r...
Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user's item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...
Links tell us about lexical and semantic Web content
The latest generation of Web search tools is beginning to exploit hypertext link information to improve ranking [1, 2] and crawling [3–5] algorithms. The hidden assumption behind such approaches, a correlation between the graph structure of the Web and its content, has not been tested explicitly despite increasing research on Web topology [6–9]. Here I formalize and quantitatively validate two conjec...
An investigation and comparison of lexical knowledge of deaf and hearing children
Objectives: The present study compares the lexical knowledge of deaf children in two age groups (9-10 and 10-11 years old) with that of two groups of normally hearing children aged 9-10 and 10-11 years. Method: This is a causal-comparative study. The achievement of 16 deaf children (ages 9-10 and 10-11 years old) and 16 hearing children (ages 9-10 and 10-11 years old) was examined on...
Enhancing Navigability in Websites Built Using Web Content Management Systems
Websites built using Web Content Management Systems (WCMSs) usually provide their users with three alternative access structures to surf their contents: indexes of categories, breadcrumb trails, and sitemaps. In addition, to find contents of his/her interest, a user can perform more or less advanced full-text searches. In this paper we propose an automatic approach to extend the navigation stru...